CORE search: 115 research outputs found

Research interests: Databases

Author: George Kollios

Discovering Clusters in Motion Time-Series Data

Author: Alon, Jonathan
Sclaroff, Stan
Kollios, George
Pavlovic, Vladimir
Publication venue: Boston University Computer Science Department
Publication date: 01/01/2002

A new approach is proposed for clustering time-series data. The approach can be used to discover groupings of similar object motions that were observed in a video collection. A finite mixture of hidden Markov models (HMMs) is fitted to the motion data using the expectation-maximization (EM) framework. Previous approaches for HMM-based clustering employ a k-means formulation, where each sequence is assigned to only a single HMM. In contrast, the formulation presented in this paper allows each sequence to belong to more than one HMM with some probability, and the hard decision about the sequence's class membership can be deferred until such a decision is required. Experiments with simulated data demonstrate the benefit of this EM-based approach when there is more "overlap" in the processes generating the data. Experiments with real data show the promising potential of HMM-based motion clustering in a number of applications.

Funding: Office of Naval Research (N000140310108, N000140110444); National Science Foundation (IIS-0208876, CAREER Award 0133825)

Boston University Institutional Repository (OpenBU)
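The soft-assignment idea in the abstract above can be illustrated with a minimal sketch, assuming discrete-symbol HMMs whose parameters are already given. It computes each sequence's log-likelihood under every HMM with the scaled forward algorithm, turns those into posterior memberships (the EM E-step), and updates only the mixing weights. Re-estimating each HMM's own parameters (a weighted Baum-Welch step) is omitted, and the function names and toy models are hypothetical, not the authors' code.

```python
import math


def hmm_log_likelihood(obs, start, trans, emit):
    """Log P(obs | HMM) via the scaled forward algorithm.

    start[i]    : initial probability of state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][s]  : probability that state i emits symbol s
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Forward recursion: sum over predecessor states, then emit obs[t].
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)          # normalising keeps alpha from underflowing
        log_lik += math.log(scale)  # product of the scales equals P(obs | model)
        alpha = [a / scale for a in alpha]
    return log_lik


def soft_memberships(sequences, models, weights):
    """E-step: posterior probability that each sequence came from each HMM."""
    resp = []
    for obs in sequences:
        logs = [math.log(w) + hmm_log_likelihood(obs, *m)
                for w, m in zip(weights, models)]
        top = max(logs)  # log-sum-exp trick for numerical stability
        exps = [math.exp(l - top) for l in logs]
        total = sum(exps)
        resp.append([e / total for e in exps])
    return resp


def update_mixture_weights(resp):
    """Partial M-step: each new mixing weight is the average responsibility."""
    k = len(resp[0])
    return [sum(r[m] for r in resp) / len(resp) for m in range(k)]


# Two hypothetical 2-state models: A favours symbol 0, B favours symbol 1.
trans = [[0.9, 0.1], [0.1, 0.9]]
model_a = ([0.5, 0.5], trans, [[0.9, 0.1], [0.8, 0.2]])
model_b = ([0.5, 0.5], trans, [[0.1, 0.9], [0.2, 0.8]])
seqs = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 1, 0, 1]]
resp = soft_memberships(seqs, [model_a, model_b], [0.5, 0.5])
weights = update_mixture_weights(resp)
# The deferred hard decision, taken only when it is actually needed.
hard = [max(range(2), key=lambda m: r[m]) for r in resp]
```

Each row of `resp` keeps a full probability vector, so an ambiguous sequence such as `[0, 1, 0, 1]` contributes to both models instead of being forced into one, which is the contrast with a k-means-style hard assignment.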

Efficient Correlation Clustering Methods for Large Consensus Clustering Instances

Author: Cordner, Nathan
Kollios, George
Publication date: 07/07/2023

Consensus clustering (or clustering aggregation) inputs $k$ partitions of a given ground set $V$, and seeks to create a single partition that minimizes disagreement with all input partitions. State-of-the-art algorithms for consensus clustering are based on correlation clustering methods like the popular Pivot algorithm. Unfortunately, these methods have not proved to be practical for consensus clustering instances where either $k$ or $|V|$ gets large. In this paper we provide practical run-time improvements for correlation clustering solvers when $|V|$ is large. We reduce the time complexity of Pivot from $O(|V|^2 k)$ to $O(|V| k)$, and its space complexity from $O(|V|^2)$ to $O(|V| k)$, a significant savings since in practice $k$ is much less than $|V|$. We also analyze a sampling method for these algorithms when $k$ is large, bridging the gap between running Pivot on the full set of input partitions (an expected 1.57-approximation) and choosing a single input partition at random (an expected 2-approximation). We show experimentally that algorithms like Pivot do obtain high-quality clustering results in practice even on small samples of input partitions.

arXiv.org e-Print Archive
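To make the consensus-clustering setting above concrete, here is a minimal Python sketch of Pivot on the majority graph, assuming the usual convention that two elements are joined by a positive edge when more than half of the input partitions place them together. It stores only the $k$ label arrays plus per-partition member lists, so it never builds the $|V| \times |V|$ agreement matrix (space $O(|V| k)$), and an optional `sample_size` runs Pivot on a random subset of the inputs. It does not reproduce the paper's $O(|V| k)$ time bound; `pivot_consensus` and its parameters are hypothetical names for illustration.

```python
import random
from collections import defaultdict


def pivot_consensus(partitions, sample_size=None, seed=None):
    """Pivot on the majority graph of k input partitions.

    partitions  : list of k label lists; partitions[i][v] is v's cluster in partition i
    sample_size : if given, run on a random subset of that many input partitions
    Returns a list mapping each element 0..n-1 to an output cluster id.
    """
    rng = random.Random(seed)
    if sample_size is not None and sample_size < len(partitions):
        partitions = rng.sample(partitions, sample_size)
    k = len(partitions)
    n = len(partitions[0])

    # members[i][label] lists the elements carrying that label in partition i.
    # Together these use O(n * k) space; no n x n agreement matrix is built.
    members = []
    for labels in partitions:
        groups = defaultdict(list)
        for v, lab in enumerate(labels):
            groups[lab].append(v)
        members.append(groups)

    unclustered = set(range(n))
    order = list(range(n))
    rng.shuffle(order)  # random pivot order
    result = [None] * n
    next_id = 0
    for p in order:
        if p not in unclustered:
            continue
        # For each still-unclustered v, count the partitions that co-cluster it with p.
        agree = defaultdict(int)
        for labels, groups in zip(partitions, members):
            for v in groups[labels[p]]:
                if v != p and v in unclustered:
                    agree[v] += 1
        # Positive edge: strictly more than half of the k partitions agree.
        cluster = [p] + [v for v, c in agree.items() if 2 * c > k]
        for v in cluster:
            result[v] = next_id
            unclustered.discard(v)
        next_id += 1
    return result


inputs = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
full = pivot_consensus(inputs, seed=1)
sampled = pivot_consensus(inputs, sample_size=1, seed=1)
```

With `sample_size=1` the majority graph is exactly the sampled partition's cliques, so Pivot simply returns that partition (relabelled), which corresponds to the "choose a single input partition at random" end of the trade-off described above.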

A Comparative Evaluation of Order-Revealing Encryption Schemes and Secure Range-Query Protocols

Author: Dmytro Bogatov
George Kollios
Leonid Reyzin
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 20/06/2019

Database query evaluation over encrypted data can allow database users to maintain the privacy of their data while outsourcing data processing. Order-Preserving Encryption (OPE) and Order-Revealing Encryption (ORE) were designed to enable efficient query execution, but provide only partial privacy. More private protocols, based on Searchable Symmetric Encryption (SSE), Oblivious RAM (ORAM) or custom encrypted data structures, have also been designed. In this paper, we develop a framework to provide the first comprehensive comparison among a number of range query protocols that ensure varying levels of privacy of user data. We evaluate five ORE-based and five generic range query protocols. We analyze and compare them both theoretically and experimentally and measure their performance over database indexing and query evaluation. We report not only execution time but also I/O performance, communication volume, and usage of cryptographic primitive operations. Our comparison reveals insights into the relative security and performance of these approaches in database settings.

Cryptology ePrint Archive
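As an illustration of how order-revealing encryption lets an untrusted server evaluate a range predicate, the sketch below implements one well-known ORE construction, the bit-by-bit scheme of Chenette, Lewi, Weis and Wu, with HMAC-SHA256 standing in as the PRF. It is offered only to show the mechanism, not as a claim about which schemes the paper benchmarks or how they are implemented; this scheme is known to leak the position of the first bit in which two plaintexts differ, and the key and function names are hypothetical. It is a toy, not production code.

```python
import hashlib
import hmac


def _prf_mod3(key: bytes, index: int, prefix: str) -> int:
    """HMAC-SHA256 used as a PRF, reduced mod 3."""
    mac = hmac.new(key, f"{index}|{prefix}".encode(), hashlib.sha256).digest()
    return int.from_bytes(mac, "big") % 3


def ore_encrypt(key: bytes, m: int, nbits: int = 32) -> list[int]:
    """Encrypt m bit by bit, most significant bit first."""
    if not 0 <= m < 2 ** nbits:
        raise ValueError("plaintext out of range")
    bits = format(m, f"0{nbits}b")
    # u_i = F(k, (i, b_1..b_{i-1})) + b_i  (mod 3)
    return [(_prf_mod3(key, i, bits[:i]) + int(bits[i])) % 3 for i in range(nbits)]


def ore_compare(ct1: list[int], ct2: list[int]) -> int:
    """Return -1, 0 or 1 as the hidden plaintexts compare, without using the key."""
    for u, v in zip(ct1, ct2):
        if u != v:
            # The first differing position is the first differing bit: the PRF
            # terms there are equal, so u - v = b_i - b'_i (mod 3).
            return 1 if u == (v + 1) % 3 else -1
    return 0


key = b"demo-key-not-for-production"
stored = {m: ore_encrypt(key, m) for m in [3, 17, 42, 99, 250]}
lo, hi = ore_encrypt(key, 10), ore_encrypt(key, 100)
# Range predicate 10 <= m <= 100: every comparison uses ciphertexts only;
# the plaintext keys are kept in the dict just to display the result.
in_range = sorted(m for m, ct in stored.items()
                  if ore_compare(ct, lo) >= 0 and ore_compare(ct, hi) <= 0)
```

The comparison needs no key, which is exactly what makes indexing and range filtering efficient, and also why such schemes provide only partial privacy compared with SSE- or ORAM-based protocols.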

Generalized Methods for Discovering Frequent Poly-Regions in DNA

Author: Benson, Gary
Kollios, George
Papapetrou, Panagiotis
Publication venue: Boston University Computer Science Department
Publication date: 21/10/2008

The problem of discovering frequent poly-regions (i.e., regions of high occurrence of a set of items or patterns of a given alphabet) in a sequence is studied, and three efficient approaches are proposed to solve it. The first one is entropy-based and applies a recursive segmentation technique that produces a set of candidate segments which may potentially lead to a poly-region. The key idea of the second approach is the use of a set of sliding windows over the sequence. Each sliding window covers a sequence segment and keeps a set of statistics that mainly include the number of occurrences of each item or pattern in that segment. Combining these statistics efficiently yields the complete set of poly-regions in the given sequence. The third approach applies a technique based on the majority vote, achieving linear running time with a minimal number of false negatives. After identifying the poly-regions, the sequence is converted to a sequence of labeled intervals (each one corresponding to a poly-region). An efficient algorithm for mining frequent arrangements of intervals is applied to the converted sequence to discover frequently occurring arrangements of poly-regions in different parts of DNA, including coding regions. The proposed algorithms are tested on various DNA sequences, producing biologically meaningful results.

Boston University Institutional Repository (OpenBU)
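The sliding-window idea described in this abstract can be sketched as follows, under a simplified assumption: a poly-region is taken to be any stretch covered by fixed-width windows in which items from a target set make up at least a given fraction of the window. Counts are updated incrementally as the window slides, so the scan is linear in the sequence length, and qualifying windows that overlap or touch are merged. The parameters `window` and `min_frac` and the merging rule are hypothetical illustrations, not the paper's exact definitions or algorithm.

```python
def poly_regions(seq, items, window, min_frac):
    """Return half-open (start, end) regions where items from `items` are dense.

    A window seq[start:start + window] qualifies when at least
    min_frac * window of its symbols belong to `items`.
    """
    targets = set(items)
    n = len(seq)
    if window > n:
        return []
    # Count target symbols in the first window, then update in O(1) per shift.
    count = sum(1 for c in seq[:window] if c in targets)
    regions = []
    for start in range(n - window + 1):
        if start > 0:
            # Add the symbol entering on the right, drop the one leaving on the left.
            count += (seq[start + window - 1] in targets) - (seq[start - 1] in targets)
        if count >= min_frac * window:
            end = start + window
            if regions and start <= regions[-1][1]:
                # Overlapping or adjacent qualifying windows extend the last region.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


dna = "ACGTAAAAAAGCGC"
a_rich = poly_regions(dna, {"A"}, window=4, min_frac=1.0)
```

Here `a_rich` is `[(4, 10)]`, the run of six A's. Each resulting region could then be treated as one labeled interval, matching the conversion step the abstract describes before frequent interval arrangements are mined.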